Abstract

Undirected graphs are often used to describe high dimensional distributions. Under sparsity conditions, the graph can be estimated using $\ell_1$ penalization methods. However, current methods assume that the data are independent and identically distributed. If the distribution, and hence the graph, evolves over time then the data are no longer identically distributed. In this paper, we show how to estimate the sequence of graphs for non-identically distributed data, where the distribution evolves over time.



1 Introduction 

Let $Z = (Z_1, \ldots, Z_p)^T$ be a random vector with distribution $P$. The distribution can be represented by an undirected graph $G = (V, F)$. The vertex set $V$ has one vertex for each component of the vector $Z$. The edge set $F$ consists of pairs $(j, k)$ that are joined by an edge. If $Z_j$ is independent of $Z_k$ given the other variables, then $(j, k)$ is not in $F$. When $Z$ is Gaussian, missing edges correspond to zeroes in the inverse covariance matrix $\Sigma^{-1}$. Suppose we have independent, identically distributed data $D = (Z^1, \ldots, Z^t, \ldots, Z^n)$ from $P$. When $p$ is small, the graph may be estimated from $D$ by testing which partial correlations are not significantly different from zero [DP04]. When $p$ is large, estimating $G$ is much more difficult. However, if the graph is sparse and the data are Gaussian, then several methods can successfully estimate $G$; see [MB06, BGd08, FHT07, LF07, BL08, RBLZ07].

All these methods assume that the graphical struc- 
ture is stable over time. But it is easy to imagine cases 
where such stability would fail. For example, $Z^t$ could
represent a large vector of stock prices at time $t$. The
conditional independence structure between stocks could 
easily change over time. Another example is gene ex- 
pression levels. As a cell moves through its metabolic 
cycle, the conditional independence relations between 
proteins could change. 

In this paper we develop a nonparametric method for estimating time varying graphical structure for multivariate Gaussian distributions using $\ell_1$ regularization. We show that, as long as the covariances change
smoothly over time, we can estimate the covariance ma- 
trix well (in predictive risk) even when p is large. We 
make the following theoretical contributions: (i) non- 
parametric predictive risk consistency and rate of con- 
vergence of the covariance matrices, (ii) consistency and 
rate of convergence in Frobenius norm of the inverse 
covariance matrix, (iii) large deviation results for co- 
variance matrices for non-identically distributed obser- 
vations, and (iv) conditions that guarantee smoothness 
of the covariances. In addition, we provide simulation 
evidence that we can recover graphical structure. We 
believe these are the first such results on time varying 
undirected graphs. 

2 The Model and Method 

Let $Z^t \sim N(0, \Sigma(t))$ be independent. It will be useful to index time as $t = 0, 1/n, 2/n, \ldots, 1$ and thus the data are $D_n = (Z^t : t = 0, 1/n, \ldots, 1)$. Associated with each $Z^t$ is its undirected graph $G(t)$. Under the assumption that the law $\mathcal{L}(Z^t)$ of $Z^t$ changes smoothly, we estimate the graph sequence $G(1), G(2), \ldots$. The graph $G(t)$ is determined by the zeroes of $\Sigma(t)^{-1}$. This method can be used to investigate a simple time series model of the form: $W^0 \sim N(0, \Sigma(0))$, and

$$W^t = W^{t-1} + Z^t, \quad \text{where } Z^t \sim N(0, \Sigma(t)).$$

Ultimately, we are interested in the general time series model where the $Z^t$'s are dependent and the graphs change over time. For simplicity, however, we assume independence but allow the graphs to change. Indeed, it is the changing graph, rather than the dependence, that is the biggest hurdle to deal with.

In the iid case, recent work [BGd08, FHT07] has considered $\ell_1$-penalized maximum likelihood estimators over the entire set of positive definite matrices,

$$\hat\Sigma_n = \arg\min_{\Sigma \succ 0} \left\{ \mathrm{tr}(\Sigma^{-1} \hat S_n) + \log|\Sigma| + \lambda |\Sigma^{-1}|_1 \right\} \qquad (1)$$

where $\hat S_n$ is the sample covariance matrix. In the non-iid case our approach is to estimate $\Sigma(t)$ at time $t$ by

$$\hat\Sigma_n(t) = \arg\min_{\Sigma \succ 0} \left\{ \mathrm{tr}(\Sigma^{-1} \hat S_n(t)) + \log|\Sigma| + \lambda |\Sigma^{-1}|_1 \right\}$$

where

$$\hat S_n(t) = \frac{\sum_s w_{st} Z_s Z_s^T}{\sum_s w_{st}} \qquad (2)$$

is a weighted covariance matrix, with weights $w_{st} = K\left(\frac{|s-t|}{h_n}\right)$ given by a symmetric nonnegative function kernel over time; in other words, $\hat S_n(t)$ is just the kernel estimator of the covariance at time $t$. An attraction of this approach is that it can use existing software for covariance estimation in the iid setting.
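
As a minimal sketch of the kernel estimator in (2), the following NumPy code forms the weighted covariance at a chosen time point. The function names, the Epanechnikov kernel, and the synthetic data are illustrative assumptions, not choices made in the paper; the paper only requires a symmetric nonnegative kernel.

```python
import numpy as np

def epanechnikov(u):
    """Symmetric nonnegative kernel supported on [-1, 1] (an illustrative choice)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_covariance(Z, times, t, h, kernel=epanechnikov):
    """Weighted covariance sum_s w_st Z_s Z_s^T / sum_s w_st with w_st = K(|s - t| / h).

    Z has shape (n, p); its rows are assumed to be mean-zero observations.
    """
    w = kernel(np.abs(times - t) / h)
    if w.sum() <= 0:
        raise ValueError("no observations fall inside the kernel window")
    # (Z * w[:, None]).T @ Z equals sum_s w_s * outer(Z_s, Z_s)
    return (Z * w[:, None]).T @ Z / w.sum()

rng = np.random.default_rng(0)
n, p = 400, 3
times = np.arange(n) / (n - 1)            # time grid on [0, 1]
Z = rng.standard_normal((n, p))           # synthetic data, for illustration only
S_half = kernel_covariance(Z, times, t=0.5, h=n ** (-1.0 / 3.0))
```

The resulting matrix can then be handed to any existing $\ell_1$-penalized covariance solver written for the iid setting in place of the sample covariance.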

2.1 Notation 

We use the following notation throughout the rest of the paper. For any matrix $W = (w_{ij})$, let $|W|$ denote the determinant of $W$, $\mathrm{tr}(W)$ the trace of $W$. Let $\varphi_{\max}(W)$ and $\varphi_{\min}(W)$ be the largest and smallest eigenvalues, respectively. We write $W^{\searrow} = \mathrm{diag}(W)$ for a diagonal matrix with the same diagonal as $W$, and $W^{\diamond} = W - W^{\searrow}$. The matrix Frobenius norm is given by $\|W\|_F = \sqrt{\sum_i \sum_j w_{ij}^2}$. The operator norm $\|W\|_2^2$ is given by $\varphi_{\max}(W W^T)$. We write $|\cdot|_1$ for the $\ell_1$ norm of a matrix vectorized, i.e., for a matrix $|W|_1 = \|\mathrm{vec}\, W\|_1 = \sum_i \sum_j |w_{ij}|$, and write $\|W\|_0$ for the number of non-zero entries in the matrix. We use $\Theta(t) = \Sigma^{-1}(t)$.

3 Risk Consistency 

In this section we define the loss and risk. Consider estimates $\hat\Sigma_n(t)$ and $\hat G_n(t) = (V, \hat F_n)$. The first risk function is

$$U(G(t), \hat G_n(t)) = E\, L(G(t), \hat G_n(t)) \qquad (3)$$

where $L(G(t), \hat G_n(t)) = |F(t) \,\Delta\, \hat F_n(t)|$, that is, the size of the symmetric difference between two edge sets. We say that $\hat G_n(t)$ is sparsistent if $U(G(t), \hat G_n(t)) \to 0$ as $n \to \infty$.
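
The loss in (3) can be illustrated with a small sketch. The helper names, the tolerance used to decide whether an entry is zero, and the two toy precision matrices are assumptions made here for illustration.

```python
import numpy as np

def edge_set(theta, tol=1e-8):
    """Undirected edges (i, j), i < j, where |theta_ij| exceeds tol."""
    theta = np.asarray(theta)
    p = theta.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(theta[i, j]) > tol}

def symmetric_difference_loss(F, F_hat):
    """Size of the symmetric difference between two edge sets."""
    return len(F ^ F_hat)

theta_true = np.array([[1.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.0]])
theta_hat = np.array([[1.0, 0.3, 0.1], [0.3, 1.0, 0.0], [0.1, 0.0, 1.0]])
# One missed edge (1, 2) and one spurious edge (0, 2).
loss = symmetric_difference_loss(edge_set(theta_true), edge_set(theta_hat))
```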

The second risk is defined as follows. Let $Z \sim N(0, \Sigma_0)$ and let $\Sigma$ be a positive definite matrix. Let

$$R(\Sigma) = \mathrm{tr}(\Sigma^{-1} \Sigma_0) + \log|\Sigma|. \qquad (4)$$

Note that, up to an additive constant,

$$R(\Sigma) = -2 E_0 (\log f_\Sigma(Z)),$$

where $f_\Sigma$ is the density for $N(0, \Sigma)$. We say that $\hat G_n(t)$ is persistent [GR04] with respect to a class of positive definite matrices $\mathcal{S}_n$ if $R(\hat\Sigma_n) - \min_{\Sigma \in \mathcal{S}_n} R(\Sigma) \xrightarrow{P} 0$. In the iid case, $\ell_1$ regularization yields a persistent estimator, as we now show.
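
For concreteness, the predictive risk $R(\Sigma) = \mathrm{tr}(\Sigma^{-1}\Sigma_0) + \log|\Sigma|$ can be evaluated numerically as sketched below. The function name and the $2 \times 2$ example are assumptions for illustration; the check relies only on the standard fact that this risk is minimized at $\Sigma = \Sigma_0$.

```python
import numpy as np

def predictive_risk(sigma, sigma0):
    """R(Sigma) = tr(Sigma^{-1} Sigma_0) + log|Sigma| for positive definite Sigma."""
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("sigma must be positive definite")
    return np.trace(np.linalg.solve(sigma, sigma0)) + logdet

sigma0 = np.array([[2.0, 0.5], [0.5, 1.0]])
risk_at_truth = predictive_risk(sigma0, sigma0)        # p + log|Sigma_0|
risk_at_identity = predictive_risk(np.eye(2), sigma0)  # tr(Sigma_0)
```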



The maximum likelihood estimate minimizes

$$\hat R_n(\Sigma) = \mathrm{tr}(\Sigma^{-1} \hat S_n) + \log|\Sigma|,$$

where $\hat S_n$ is the sample covariance matrix. Minimizing $\hat R_n(\Sigma)$ without constraints gives $\hat\Sigma_n = \hat S_n$. We would like to minimize $\hat R_n(\Sigma)$ subject to $\|\Sigma^{-1}\|_0 \le L$. This would give the "best" sparse graph $G$, but it is not a convex optimization problem. Hence we estimate $\hat\Sigma_n$ by solving a convex relaxation problem as written in (1) instead. Algorithms for carrying out this optimization are given by [BGd08, FHT07]. Given $L_n$, $\forall n$, let

$$\mathcal{S}_n = \{\Sigma : \Sigma \succ 0, \ |\Sigma^{-1}|_1 \le L_n\}. \qquad (5)$$

We define the oracle estimator and write (1) as (7):

$$\Sigma^*(n) = \arg\min_{\Sigma \in \mathcal{S}_n} R(\Sigma), \qquad (6)$$

$$\hat\Sigma_n = \arg\min_{\Sigma \in \mathcal{S}_n} \hat R_n(\Sigma). \qquad (7)$$

Note that one can choose to only penalize off-diagonal elements of $\Sigma^{-1}$ as in [RBLZ07], if desired. We have the following result, whose proof appears in Section 3.2.



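Theorem 1 Suppose that $p_n \le n^{\xi}$ for some $\xi \ge 0$ and $L_n = o\left((n/\log p_n)^{1/2}\right)$ for (5). Then for the sequence of empirical estimators as defined in (7) and $\Sigma^*(n)$, $\forall n$ as in (6),

$$R(\hat\Sigma_n) - R(\Sigma^*(n)) \xrightarrow{P} 0.$$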

3.1 Risk Consistency for the Non-identical Case 

In the non-iid case we estimate $\Sigma(t)$ at time $t \in [0, 1]$. Given $\Sigma(t)$, let

$$\hat R_n(\Sigma(t)) = \mathrm{tr}(\Sigma(t)^{-1} \hat S_n(t)) + \log|\Sigma(t)|.$$

For a given $\ell_1$ bound $L_n$, we define $\hat\Sigma_n(t)$ as the minimizer of $\hat R_n(\Sigma)$ subject to $\Sigma \in \mathcal{S}_n$,

$$\hat\Sigma_n(t) = \arg\min_{\Sigma \in \mathcal{S}_n} \left\{ \mathrm{tr}(\Sigma^{-1} \hat S_n(t)) + \log|\Sigma| \right\} \qquad (8)$$

where $\hat S_n(t)$ is given in (2), with $K(\cdot)$ a symmetric nonnegative function with compact support:

A1 The kernel function $K$ has a bounded support $[-1, 1]$.

Lemma 2 Let $\Sigma(t) = [\sigma_{jk}(t)]$. Suppose the following conditions hold:

1. There exists $C_0 > 0$, $C$ such that $\max_{j,k} \sup_t |\sigma'_{jk}(t)| \le C_0$ and $\max_{j,k} \sup_t |\sigma''_{jk}(t)| \le C$.

2. $p_n \le n^{\xi}$ for some $\xi \ge 0$.

3. $h_n \asymp n^{-1/3}$.

Then $\max_{j,k} |\hat S_n(t,j,k) - \Sigma(t,j,k)| = O_P\left(\sqrt{\log n}/n^{1/3}\right)$ for all $t > 0$.

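Proof: By the triangle inequality,

$$|\hat S_n(t,j,k) - \Sigma(t,j,k)| \le |\hat S_n(t,j,k) - E\hat S_n(t,j,k)| + |E\hat S_n(t,j,k) - \Sigma(t,j,k)|.$$

In Lemma 14 we show that

$$\max_{j,k} \sup_t |E\hat S_n(t,j,k) - \Sigma(t,j,k)| = O(C_0 h_n).$$

In Lemma 15 we show that

$$P\left(|\hat S_n(t,j,k) - E\hat S_n(t,j,k)| > \epsilon\right) \le \exp\{-c_1 h_n n \epsilon^2\}$$

for some $c_1 > 0$. Hence,

$$P\left(\max_{j,k} |\hat S_n(t,j,k) - E\hat S_n(t,j,k)| > \epsilon\right) \le \exp\left\{-n h_n \left(C\epsilon^2 - 2\xi \log n/(n h_n)\right)\right\} \qquad (9)$$

and

$$\max_{j,k} |\hat S_n(t,j,k) - E\hat S_n(t,j,k)| = O_P\left(\sqrt{\frac{\log n}{n h_n}}\right).$$

Hence the result holds for $h_n \asymp n^{-1/3}$. □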
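
To see why the bandwidth $h_n \asymp n^{-1/3}$ produces the stated rate, the two terms of the triangle inequality can be compared directly. The following is a short worked version that ignores constants; it is a reading aid, not part of the original proof.

```latex
\begin{align*}
\text{bias:} \quad & O(h_n) = O\!\left(n^{-1/3}\right), \\
\text{deviation:} \quad & O_P\!\left(\sqrt{\frac{\log n}{n h_n}}\right)
  = O_P\!\left(\sqrt{\frac{\log n}{n \cdot n^{-1/3}}}\right)
  = O_P\!\left(\frac{\sqrt{\log n}}{n^{1/3}}\right), \\
\text{total:} \quad & O\!\left(n^{-1/3}\right) + O_P\!\left(\frac{\sqrt{\log n}}{n^{1/3}}\right)
  = O_P\!\left(\frac{\sqrt{\log n}}{n^{1/3}}\right).
\end{align*}
```

A smaller bandwidth would shrink the bias but inflate the deviation term, and a larger one would do the reverse, so the two are balanced (up to the logarithmic factor) at $h_n \asymp n^{-1/3}$.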

With the use of Lemma 2, the proof of the following follows the same lines as that of Theorem 1.

Theorem 3 Suppose all conditions in Lemma 2 and the following hold:

$$L_n = o\left(n^{1/3}/\sqrt{\log n}\right). \qquad (10)$$

Then, $\forall t > 0$, for the sequence of estimators as in (8),

$$R(\hat\Sigma_n(t)) - R(\Sigma^*(t)) \xrightarrow{P} 0.$$

Remark 4 If a local linear smoother is substituted for 
a kernel smoother, the rate can be improved from n 1 / 3 
to n 2 / 5 as the bias will be bounded as 0(h 2 ) in ( 13. Q . 

Remark 5 Suppose that $\forall i, j$, if $\theta_{ij} \ne 0$, we have $\theta_{ij} = \Omega(1)$. Then Condition (10) allows that $\|\Theta\|_0 = L_n$; hence if $p = n^{\xi}$ and $\xi < 1/3$, we have that $\|\Theta\|_0 = \Omega(p)$. Hence the family of graphs that we can guarantee persistency for, although sparse, is likely to include connected graphs, for example, when $\Omega(p)$ edges were formed randomly among $p$ nodes.

The smoothness condition in Lemma 2 is expressed in terms of the elements of $\Sigma(t) = [\sigma_{ij}(t)]$. It might be more natural to impose smoothness on $\Theta(t) = \Sigma(t)^{-1}$ instead. In fact, smoothness of $\Theta_t$ implies smoothness of $\Sigma_t$ as the next result shows. Let us first specify two assumptions. We use $\sigma_i^2(x)$ as a shorthand for $\sigma_{ii}(x)$.

A2 There exists some constant $S_0 < \infty$ such that

$$\max_{i=1,\ldots,p} \sup_{t \in [0,1]} |\sigma_i(t)| \le S_0 < \infty, \quad \text{hence} \qquad (11)$$

$$\max_{i=1,\ldots,p} \|\sigma_i\|_\infty \le S_0. \qquad (12)$$

A3 Let $\theta_{ij}(t)$, $\forall i, j$, be twice differentiable functions such that $\theta'_{ij}(t) < \infty$ and $\theta''_{ij}(t) < \infty$, $\forall t \in [0, 1]$. In addition, there exist constants $S_1, S_2 < \infty$ such that

$$\sup_{t \in [0,1]} \sum_{k=1}^p \sum_{\ell=1}^p \sum_{i=1}^p \sum_{j=1}^p |\theta'_{ki}(t)\, \theta'_{\ell j}(t)| \le S_1 \qquad (13)$$

$$\sup_{t \in [0,1]} \sum_{k=1}^p \sum_{\ell=1}^p |\theta''_{k\ell}(t)| \le S_2 \qquad (14)$$

where the first inequality guarantees that $\sup_{t \in [0,1]} \sum_{k=1}^p \sum_{\ell=1}^p |\theta'_{k\ell}(t)| \le \sqrt{S_1} < \infty$.

Lemma 7 Denote the elements of $\Theta(t) = \Sigma(t)^{-1}$ by $\theta_{jk}(t)$. Under A2 and A3, the smoothness condition in Lemma 2 holds.

The proof is in Section 6. In Section 7 we show some preliminary results on achieving upper bounds on quantities that appear in Condition 1 of Lemma 2 through the sparsity level of the inverse covariance matrix, i.e., $\|\Theta_t\|_0$, $\forall t \in [0, 1]$.

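Definition 6 For a function $u : [0, 1] \to \mathbb{R}$, let $\|u\|_\infty = \sup_{x \in [0,1]} |u(x)|$.

3.2 Proof of Theorem 1

Note that $\forall n$,

$$\sup_{\Sigma \in \mathcal{S}_n} |R(\Sigma) - \hat R_n(\Sigma)| \le \sum_{j,k} |\Sigma^{-1}_{jk}| \, |\hat S_n(j,k) - \Sigma_0(j,k)| \le \delta_n |\Sigma^{-1}|_1,$$

where it follows from [RBLZ07] that

$$\delta_n = \max_{j,k} |\hat S_n(j,k) - \Sigma_0(j,k)| = O_P\left(\sqrt{\log p / n}\right).$$

Hence, minimizing over $\mathcal{S}_n$ with $L_n = o\left((n/\log p_n)^{1/2}\right)$, $\sup_{\Sigma \in \mathcal{S}_n} |R(\Sigma) - \hat R_n(\Sigma)| = o_P(1)$. By the definitions of $\Sigma^*(n) \in \mathcal{S}_n$ and $\hat\Sigma_n \in \mathcal{S}_n$, we immediately have $R(\Sigma^*(n)) \le R(\hat\Sigma_n)$ and $\hat R_n(\hat\Sigma_n) \le \hat R_n(\Sigma^*(n))$; thus

$$0 \le R(\hat\Sigma_n) - R(\Sigma^*(n)) = R(\hat\Sigma_n) - \hat R_n(\hat\Sigma_n) + \hat R_n(\hat\Sigma_n) - R(\Sigma^*(n))$$
$$\le R(\hat\Sigma_n) - \hat R_n(\hat\Sigma_n) + \hat R_n(\Sigma^*(n)) - R(\Sigma^*(n)).$$

Using the triangle inequality and $\hat\Sigma_n, \Sigma^*(n) \in \mathcal{S}_n$,

$$|R(\hat\Sigma_n) - R(\Sigma^*(n))| \le |R(\hat\Sigma_n) - \hat R_n(\hat\Sigma_n) + \hat R_n(\Sigma^*(n)) - R(\Sigma^*(n))|$$
$$\le |R(\hat\Sigma_n) - \hat R_n(\hat\Sigma_n)| + |\hat R_n(\Sigma^*(n)) - R(\Sigma^*(n))| \le 2 \sup_{\Sigma \in \mathcal{S}_n} |R(\Sigma) - \hat R_n(\Sigma)|.$$

Thus $\forall \epsilon > 0$, the event $\{R(\hat\Sigma_n) - R(\Sigma^*(n)) > \epsilon\}$ is contained in the event $\{\sup_{\Sigma \in \mathcal{S}_n} |R(\Sigma) - \hat R_n(\Sigma)| > \epsilon/2\}$. Thus, for $L_n = o((n/\log n)^{1/2})$, and $\forall \epsilon > 0$, as $n \to \infty$,

$$P\left(R(\hat\Sigma_n) - R(\Sigma^*(n)) > \epsilon\right) \le P\left(\sup_{\Sigma \in \mathcal{S}_n} |R(\Sigma) - \hat R_n(\Sigma)| > \epsilon/2\right) \to 0. \quad □$$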



4 Frobenius Norm Consistency 

In this section, we show an explicit convergence rate in the Frobenius norm for estimating $\Theta(t)$, $\forall t$, where $p$, $|F|$ grow with $n$, so long as the covariances change smoothly over $t$. Note that certain smoothness assumptions on a matrix $W$ would guarantee the corresponding smoothness conditions on its inverse $W^{-1}$, so long as $W$ is non-singular, as we show in Section 6. We first write our time-varying estimator $\hat\Theta_n(t)$ for $\Sigma^{-1}(t)$ at time $t \in [0, 1]$ as the minimizer of the $\ell_1$ regularized negative smoothed log-likelihood over the entire set of positive definite matrices,

$$\hat\Theta_n(t) = \arg\min_{\Theta \succ 0} \left\{ \mathrm{tr}(\Theta \hat S_n(t)) - \log|\Theta| + \lambda_n |\Theta|_1 \right\} \qquad (15)$$

where $\lambda_n$ is a non-negative regularization parameter, and $\hat S_n(t)$ is the smoothed sample covariance matrix using a kernel function as defined in (2).
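
The objective in (15) is straightforward to evaluate, which is useful when comparing candidate estimates. The sketch below is only an evaluator, not a solver; the function name and the small example matrix are assumptions made here for illustration.

```python
import numpy as np

def penalized_objective(theta, S, lam):
    """tr(Theta S) - log|Theta| + lam * |Theta|_1, or +inf if Theta is not positive definite."""
    try:
        L = np.linalg.cholesky(theta)      # succeeds only for positive definite input
    except np.linalg.LinAlgError:
        return np.inf
    logdet = 2.0 * np.log(np.diag(L)).sum()
    return np.trace(theta @ S) - logdet + lam * np.abs(theta).sum()

S = np.array([[1.0, 0.4], [0.4, 1.0]])     # stands in for the smoothed covariance
unpenalized_min = np.linalg.inv(S)         # minimizer of the objective when lam = 0
value_at_inverse = penalized_objective(unpenalized_min, S, lam=0.0)
value_at_identity = penalized_objective(np.eye(2), S, lam=0.0)
```

With the penalty switched off, the inverse of the covariance attains a lower value than the identity, as expected; a positive penalty trades some of that fit for sparsity.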

Now fix a point of interest $t_0$. In the following, we use $\Sigma_0 = (\sigma_{ij}(t_0))$ to denote the true covariance matrix at this time. Let $\Theta_0 = \Sigma_0^{-1}$ be its inverse matrix. Define the set $S = \{(i, j) : \theta_{ij}(t_0) \ne 0, \ i \ne j\}$. Then $|S| = s$. Note that $|S|$ is twice the number of edges in the graph $G(t_0)$. We make the following assumptions.

A4 Let $p + s = o\left(n^{2/3}/\log n\right)$ and $\varphi_{\min}(\Sigma_0) \ge k > 0$, hence $\varphi_{\max}(\Theta_0) \le 1/k$. For some sufficiently large constant $M$, let $\varphi_{\min}(\Theta_0) = \Omega\left(2M \sqrt{\frac{(p+s)\log n}{n^{2/3}}}\right)$.

The proof draws upon techniques from [RBLZ07], with modifications necessary to handle the fact that we penalize $|\Theta|_1$ rather than $|\Theta^{\diamond}|_1$ as in their case.

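Theorem 8 Let $\hat\Theta_n(t)$ be the minimizer defined by (15). Suppose all conditions in Lemma 2 and A4 hold. If

$$\lambda_n \asymp \sqrt{\frac{\log n}{n^{2/3}}},$$

then

$$\|\hat\Theta_n(t) - \Theta_0\|_F = O_P\left(2M \sqrt{\frac{(p+s)\log n}{n^{2/3}}}\right). \qquad (16)$$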

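Proof: Let $0$ be a matrix with all entries being zero. Let

$$Q(\Theta) = \mathrm{tr}(\Theta \hat S_n(t_0)) - \log|\Theta| + \lambda|\Theta|_1 - \mathrm{tr}(\Theta_0 \hat S_n(t_0)) + \log|\Theta_0| - \lambda|\Theta_0|_1$$
$$= \mathrm{tr}\left((\Theta - \Theta_0)(\hat S_n(t_0) - \Sigma_0)\right) - (\log|\Theta| - \log|\Theta_0|) + \mathrm{tr}\left((\Theta - \Theta_0)\Sigma_0\right) + \lambda(|\Theta|_1 - |\Theta_0|_1). \qquad (17)$$

$\hat\Theta_n$ minimizes $Q(\Theta)$, or equivalently $\hat\Delta_n = \hat\Theta_n - \Theta_0$ minimizes $G(\Delta) \equiv Q(\Theta_0 + \Delta)$. Hence $G(0) = 0$ and $G(\hat\Delta_n) \le G(0) = 0$ by definition. Define for some constant $C_1$, $\delta_n = C_1 \sqrt{\frac{\log n}{n^{2/3}}}$. Now, let

$$\lambda_n = \frac{C_1}{\epsilon} \sqrt{\frac{\log n}{n^{2/3}}} = \frac{\delta_n}{\epsilon} \quad \text{for some } 0 < \epsilon < 1. \qquad (18)$$

Consider now the set

$$T_n = \{\Delta : \Delta = B - \Theta_0, \ B, \Theta_0 \succ 0, \ \|\Delta\|_F = M r_n\},$$

where

$$r_n = \sqrt{\frac{(p+s)\log n}{n^{2/3}}} \asymp \delta_n \sqrt{p+s} \to 0. \qquad (19)$$

Claim 9 Under A4, for all $\Delta \in T_n$ such that $\|\Delta\|_F = o(1)$ as in (19), $\Theta_0 + v\Delta \succ 0$, $\forall v \in I \supset [0, 1]$.

Proof: It is sufficient to show that $\Theta_0 + (1 + \epsilon)\Delta \succ 0$ and $\Theta_0 - \epsilon\Delta \succ 0$ for some $1 > \epsilon > 0$. Indeed, $\varphi_{\min}(\Theta_0 + (1+\epsilon)\Delta) \ge \varphi_{\min}(\Theta_0) - (1+\epsilon)\|\Delta\|_2 > 0$ for $\epsilon < 1$, given that $\varphi_{\min}(\Theta_0) = \Omega(2M r_n)$ and $\|\Delta\|_2 \le \|\Delta\|_F = M r_n$. Similarly, $\varphi_{\min}(\Theta_0 - \epsilon\Delta) \ge \varphi_{\min}(\Theta_0) - \epsilon\|\Delta\|_2 > 0$ for $\epsilon < 1$. □

Thus we have that $\log\det(\Theta_0 + v\Delta)$ is infinitely differentiable on the open interval $I \supset [0, 1]$ of $v$. This allows us to use the Taylor's formula with integral remainder to obtain the following lemma:

Lemma 10 With probability $1 - 1/n^c$ for some $c \ge 2$, $G(\Delta) > 0$ for all $\Delta \in T_n$.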
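Proof: Let us use $A$ as a shorthand for

$$\mathrm{vec}\Delta^T \left( \int_0^1 (1 - v)(\Theta_0 + v\Delta)^{-1} \otimes (\Theta_0 + v\Delta)^{-1}\, dv \right) \mathrm{vec}\Delta,$$

where $\otimes$ is the Kronecker product (if $W = (w_{ij})_{m \times n}$, $P = (b_{k\ell})_{p \times q}$, then $W \otimes P = (w_{ij} P)_{mp \times nq}$), and $\mathrm{vec}\Delta \in \mathbb{R}^{p^2}$ is $\Delta_{p \times p}$ vectorized. Now, the Taylor expansion gives

$$\log|\Theta_0 + \Delta| - \log|\Theta_0| = \frac{d}{dv} \log|\Theta_0 + v\Delta| \Big|_{v=0} + \int_0^1 (1 - v) \frac{d^2}{dv^2} \log\det(\Theta_0 + v\Delta)\, dv = \mathrm{tr}(\Sigma_0 \Delta) - A,$$

where by symmetry, $\mathrm{tr}(\Sigma_0 \Delta) = \mathrm{tr}((\Theta - \Theta_0)\Sigma_0)$. Hence

$$G(\Delta) = A + \mathrm{tr}\left(\Delta(\hat S_n - \Sigma_0)\right) + \lambda_n\left(|\Theta_0 + \Delta|_1 - |\Theta_0|_1\right). \qquad (20)$$

For an index set $S$ and a matrix $W = [w_{ij}]$, write $W_S \equiv (w_{ij} I((i,j) \in S))$, where $I(\cdot)$ is an indicator function. Recall $S = \{(i,j) : \Theta_{0ij} \ne 0, \ i \ne j\}$ and let $S^c = \{(i,j) : \Theta_{0ij} = 0, \ i \ne j\}$. Hence $\Theta = \Theta^{\searrow} + \Theta^{\diamond}_S + \Theta^{\diamond}_{S^c}$, $\forall \Theta$ in our notation. Note that we have $\Theta^{\diamond}_{0 S^c} = 0$,

$$|\Theta_0^{\diamond} + \Delta^{\diamond}|_1 = |\Theta^{\diamond}_{0S} + \Delta^{\diamond}_S|_1 + |\Delta^{\diamond}_{S^c}|_1, \qquad |\Theta_0^{\diamond}|_1 = |\Theta^{\diamond}_{0S}|_1, \quad \text{hence}$$
$$|\Theta_0^{\diamond} + \Delta^{\diamond}|_1 - |\Theta_0^{\diamond}|_1 \ge |\Delta^{\diamond}_{S^c}|_1 - |\Delta^{\diamond}_S|_1,$$
$$|\Theta_0^{\searrow} + \Delta^{\searrow}|_1 - |\Theta_0^{\searrow}|_1 \ge -|\Delta^{\searrow}|_1,$$

where the last two steps follow from the triangle inequality. Therefore

$$|\Theta_0 + \Delta|_1 - |\Theta_0|_1 = |\Theta_0^{\diamond} + \Delta^{\diamond}|_1 - |\Theta_0^{\diamond}|_1 + |\Theta_0^{\searrow} + \Delta^{\searrow}|_1 - |\Theta_0^{\searrow}|_1 \ge |\Delta^{\diamond}_{S^c}|_1 - |\Delta^{\diamond}_S|_1 - |\Delta^{\searrow}|_1. \qquad (21)$$

Now, from Lemma 2, $\max_{j,k} |\hat S_n(t,j,k) - \sigma(t,j,k)| = O_P\left(\sqrt{\log n}/n^{1/3}\right) = O_P(\delta_n)$. By (9), with probability $1 - 1/n^c$,

$$\left|\mathrm{tr}\left(\Delta(\hat S_n - \Sigma_0)\right)\right| \le \delta_n |\Delta|_1, \quad \text{hence by (21)}$$
$$\mathrm{tr}\left(\Delta(\hat S_n - \Sigma_0)\right) + \lambda_n\left(|\Theta_0 + \Delta|_1 - |\Theta_0|_1\right)$$
$$\ge -\delta_n\left(|\Delta^{\searrow}|_1 + |\Delta^{\diamond}_S|_1 + |\Delta^{\diamond}_{S^c}|_1\right) + \lambda_n\left(|\Delta^{\diamond}_{S^c}|_1 - |\Delta^{\diamond}_S|_1 - |\Delta^{\searrow}|_1\right)$$
$$= -(\delta_n + \lambda_n)\left(|\Delta^{\searrow}|_1 + |\Delta^{\diamond}_S|_1\right) + (\lambda_n - \delta_n)|\Delta^{\diamond}_{S^c}|_1$$
$$\ge -(\delta_n + \lambda_n)\left(|\Delta^{\searrow}|_1 + |\Delta^{\diamond}_S|_1\right), \qquad (22)$$

where the last step uses $\lambda_n = \delta_n/\epsilon > \delta_n$, and where

$$(\delta_n + \lambda_n)\left(|\Delta^{\searrow}|_1 + |\Delta^{\diamond}_S|_1\right) \le (\delta_n + \lambda_n)\left(\sqrt{p}\,\|\Delta^{\searrow}\|_F + \sqrt{s}\,\|\Delta^{\diamond}_S\|_F\right)$$
$$\le (\delta_n + \lambda_n)\left(\sqrt{p}\,\|\Delta^{\searrow}\|_F + \sqrt{s}\,\|\Delta^{\diamond}\|_F\right)$$
$$\le (\delta_n + \lambda_n)\max\{\sqrt{p}, \sqrt{s}\}\left(\|\Delta^{\searrow}\|_F + \|\Delta^{\diamond}\|_F\right)$$
$$\le (\delta_n + \lambda_n)\max\{\sqrt{p}, \sqrt{s}\}\sqrt{2}\,\|\Delta\|_F$$
$$\le \delta_n \frac{1+\epsilon}{\epsilon} \sqrt{p+s}\,\sqrt{2}\,\|\Delta\|_F. \qquad (23)$$

Combining (20), (22), and (23), we have with probability $1 - 1/n^c$, for all $\Delta \in T_n$,

$$G(\Delta) \ge A - (\delta_n + \lambda_n)\left(|\Delta^{\searrow}|_1 + |\Delta^{\diamond}_S|_1\right) \ge \frac{k^2}{2+\tau}\|\Delta\|_F^2 - \delta_n \frac{1+\epsilon}{\epsilon}\sqrt{p+s}\,\sqrt{2}\,\|\Delta\|_F$$
$$= \|\Delta\|_F^2 \left(\frac{k^2}{2+\tau} - \frac{\delta_n \sqrt{2}(1+\epsilon)\sqrt{p+s}}{\epsilon\|\Delta\|_F}\right) = \|\Delta\|_F^2 \left(\frac{k^2}{2+\tau} - \frac{\delta_n \sqrt{2}(1+\epsilon)\sqrt{p+s}}{\epsilon M r_n}\right) > 0$$

for $M$ sufficiently large, where the bound on $A$ comes from Lemma 11 by [RBLZ07]. □

Lemma 11 ([RBLZ07]) For some $\tau = o(1)$, under A4,

$$\mathrm{vec}\Delta^T \left( \int_0^1 (1 - v)(\Theta_0 + v\Delta)^{-1} \otimes (\Theta_0 + v\Delta)^{-1}\, dv \right) \mathrm{vec}\Delta \ge \|\Delta\|_F^2 \frac{k^2}{2+\tau}, \quad \text{for all } \Delta \in T_n.$$

We next show the following claim.

Claim 12 If $G(\Delta) > 0$, $\forall \Delta \in T_n$, then $G(\Delta) > 0$ for all $\Delta$ in $V_n = \{\Delta : \Delta = D - \Theta_0, \ D \succ 0, \ \|\Delta\|_F > M r_n, \text{ for } r_n \text{ as in (19)}\}$. Hence if $G(\Delta) > 0$, $\forall \Delta \in T_n$, then $G(\Delta) > 0$ for all $\Delta \in T_n \cup V_n$.

Proof: Now by contradiction, suppose $G(\Delta') \le 0$ for some $\Delta' \in V_n$. Let $\Delta_0 = \frac{M r_n}{\|\Delta'\|_F}\Delta'$. Thus $\Delta_0 = \theta 0 + (1 - \theta)\Delta'$, where $0 < 1 - \theta = \frac{M r_n}{\|\Delta'\|_F} < 1$ by definition of $\Delta_0$. Hence $\Delta_0 \in T_n$ given that $\Theta_0 + \Delta_0 \succ 0$ by Claim 13. Hence by convexity of $G(\Delta)$, we have that $G(\Delta_0) \le \theta G(0) + (1 - \theta)G(\Delta') \le 0$, contradicting that $G(\Delta_0) > 0$ for $\Delta_0 \in T_n$. □

By Claim 12 and the fact that $G(\hat\Delta_n) \le G(0) = 0$, we have the following: If $G(\Delta) > 0$, $\forall \Delta \in T_n$, then $\hat\Delta_n \notin (T_n \cup V_n)$, that is, $\|\hat\Delta_n\|_F < M r_n$, given that $\hat\Delta_n = \hat\Theta_n - \Theta_0$, where $\hat\Theta_n, \Theta_0 \succ 0$. Therefore

$$P\left(\|\hat\Delta_n\|_F \ge M r_n\right) = 1 - P\left(\|\hat\Delta_n\|_F < M r_n\right) \le 1 - P\left(G(\Delta) > 0, \ \forall \Delta \in T_n\right) = P\left(G(\Delta) \le 0 \text{ for some } \Delta \in T_n\right) < \frac{1}{n^c}.$$

We thus establish that $\|\hat\Delta_n\|_F \le O_P(M r_n)$. □

Claim 13 Let $B$ be a $p \times p$ matrix. If $B \succ 0$ and $B + D \succ 0$, then $B + vD \succ 0$ for all $v \in [0, 1]$.

Proof: We only need to check for $v \in (0, 1)$, where $1 - v > 0$; $\forall x \in \mathbb{R}^p$ with $x \ne 0$, by $B \succ 0$ and $B + D \succ 0$, $x^T B x > 0$ and $x^T (B + D) x > 0$; hence $x^T D x > -x^T B x$. Thus $x^T (B + vD) x = x^T B x + v x^T D x > (1 - v) x^T B x > 0$. □

5 Large Deviation Inequalities

Before we go on, we explain the notation that we follow throughout this section. We switch notation from $t$ to $x$ and form a regression problem for non-iid data. Given an interval of $[0, 1]$, the point of interest is $x_0 = 1$. We form a design matrix by sampling a set of $n$ $p$-dimensional Gaussian random vectors $Z^t$ at $t = 0, 1/n, 2/n, \ldots, 1$, where $Z^t \sim N(0, \Sigma_t)$ are independently distributed. In this section, we index the random vectors $Z$ with $k = 0, 1, \ldots, n$ such that $Z_k = Z^t$ for $k = nt$, with corresponding covariance matrix denoted by $\Sigma_k$. Hence

$$Z_k = (Z_{k1}, \ldots, Z_{kp})^T \sim N(0, \Sigma_k), \quad \forall k. \qquad (24)$$

These are independent but not identically distributed. We will need to generalize the usual inequalities. In Section A, via a boxcar kernel function, we use moment generating functions to show that for $\hat\Sigma = \frac{1}{n}\sum_{k=1}^n Z_k Z_k^T$,

$$P^n\left(|\hat\Sigma_{ij} - \Sigma_{ij}(x_0)| > \epsilon\right) < e^{-c n \epsilon^2} \qquad (25)$$

where $P^n = P_1 \times \cdots \times P_n$ denotes the product measure. We look across $n$ time-varying Gaussian vectors, and roughly, we compare $\hat\Sigma_{ij}$ with $\Sigma_{ij}(x_0)$, where $\Sigma(x_0) = \Sigma_n$ is the covariance matrix in the end of the window for $t_0 = n$. Furthermore, we derive inequalities in Section 5.1 for a general kernel function.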



5.1 Bounds For Kernel Smoothing 

In this section, we derive large deviation inequalities for the covariance matrix based on kernel regression estimations. Recall that we assume that the symmetric nonnegative kernel function $K$ has a bounded support $[-1, 1]$ in A1. This kernel has the property that:

$$2\int_{-1}^{0} v K(v)\,dv \le 2\int_{-1}^{0} K(v)\,dv = 1 \qquad (26)$$

$$2\int_{-1}^{0} v^2 K(v)\,dv \le 1. \qquad (27)$$

In order to estimate to, instead of taking an average of 
sample variances/covariances over the last n samples, 
we use the weighting scheme such that data close to to 
receives larger weights than those that are far away. Let 
= (<Tij(x)). Let us define iro = „ = 1, and 
Vi = 1, . . . , n, Xi = and 



nn 



x i — ^0 



(28) 



where the approximation is due to replacing the sum 
with the Riemann integral: 



Xi - x 



2 / K(v)dv = 1, 



due to the fact that K(v) has compact support in [—1, 1] 
and h < 1. Let = (ay (xk)) , Vfc = 1, . . . , n, where 
<Tij(xk) = cov(Z kil Z kj ) = Pij{x k )(j l (x k )(jj{xk) and 
Pij(xk) is the correlation coefficient between Zi and Zj 
at time x k . Recall that we have independent (Z k iZ k j) 
for all k = 1, . . . ,n such that 'E(ZkiZkj) = crij(x k ). 
Let 



n x — ' h 

k=l 



h 



<Jij{x k ), hence 



E /Ak{xo)Z ki Z k j = }Jk(xo)(Tij(x k ) = 

fe=i fe=i 
We thus decompose and bound for point of interest xq 



k=l 



< 



S,h(xo)ZkiZ k j - (Tij(x ) 

n 

~E**^2h{xo)Z kl Z k: j - <Tij(xo) 

k=l 

n n 

y^^k(xo)z k jZ k j - Ey^j k (x )ZkiZ k j 



fe=i 



k=l 



Lemma 14 Suppose there exists $C > 0$ such that $\max_{i,j} \sup_t |\sigma''(t,i,j)| \le C$. Then

$$\forall t \in [0, 1], \quad \max_{i,j} |E\hat S_n(t,i,j) - \sigma_{ij}(t)| = O(h).$$

Proof: W.l.o.g, let t = t , hence ES n (t, i, j) = 

We use the Riemann integral to approximate the sum, 



1 - 9 



n e — ' h 

fe=i 



x k - x 



<Jij(x k ) 



-K 



U — Xq 



(Jij{u)du 



K(v)o~ij(xo + hv)dv. 

-l/h 

We now use Taylor's Formula to replace cry (xq + hv) 
and obtain 2 §_ x i h ( x o + hv)dv = 

2j°_ lK {v) (a l] (x ) + hva^x ) + <^ Whv) 

= <7y (z ) + 2 f° x K(v) (hva'^xo) + 



2 

C(hvf 



dv 



dv, 



where 2 / K(v) I hva^Axo) 



C{hvf 



2ha' l Ax ) / vK(v)dv + 



2 

Ch 2 



dv 



v 2 K(v)dv 



Ch 



< ha'iAxo) H — , where y(v) — x < hv. 

Thus -cry (x ) =0(h). □ 

We now move on to the large deviation bound for all 
entries of the smoothed empirical covariance matrix. 

Lemma 15 Fore < ^feaM 

where C\ is defined in Claim\T8\for some C > 0, 



P[\S n (t,i,j)--ES n (t,i,j)\ >ej <exp{-C^e 2 }. 
Proof: Let us define A k = Z ki Z k j — cry (x^). 
p(\S n (t,i,j)-ES n (t,i,j)\ >e) 

Cn n \ 

y^ j £ k (xo)Z k iZkj - y^ y tk(xo)(Tij(x k ) > e 
k=l k=l I 

For every t > 0, we have by Markov's inequality 



(29) 



P I ^2n£ k (x )A k > 

\k=l 
= P 



}J k (xo)ZkiZ k j 



fe=i 



+ |$ 1 (i,j)-cry(ar )| 



< 



(30) 



Before we start our analysis on large deviations, we first look at the bias term.

Before we continue, for a given $t$, let us first define the following quantities, where $i, j$ are omitted from $\Phi_1(i,j)$:



a k = fK(^jf^) {<Ji{x k )(J 3 {x k ) + <Tij(x k )) 
b k = fK(^=^) {a t {x k )(j {x k )-<7 l0 {x k )) thus 



2 

1 — 



By (03, (EB, LemmaHU for t < jj- 



• M = !Iinx,v | „ <7t(2Cfc)c7-3-(xjfe)) 

We now establish some convenient comparisons; see Sec- 
tion IB . 1 1 and IB . 2 I for their proofs. 

Claim 16 |j < ^-and^ < 2M 2 , where both equal- 
ities are established at pij (x k ) = 1, Vfc. 

Lemma 17 Forb k < a k < \, Vfc, \ YX=\ ln (i-a k )( 



vfe=l 



< 



h \ h 



Ee 



2 r-/ »t-»D 



(l-a fc )(l+6 fc ) 



To show the following, we first replace the sum with a 
Riemann integral, and then use Taylor's Formula to ap- 
proximate <Ti(x k ), <J 3 (x k ), and aij(x k ),yk = 1, . . . , n 
with <7j , dj o~ij and their first derivatives at xq respec- 
tively, plus some remainder terms; see Section 1531 for 
details. 

Claim 18 Forh = n~ e for some 1 > e > 0, there exists 
some constant C\ > such that 



*2(*,j) 



Ci(a?(x )o- 2 Ax Q ) + aUx )) 



LemmaTyicomputes the moment generating function for 
j-K ( Xk ~^ x ° ) Z k i ■ Z k j. The proof proceeds exactly as 
that of LemmallTlafter substituting t with { Xk ~ h x ° ) 
everywhere. 

Lemma 19 LetfK(^^) (H-/o ij -(s fc ))<T i (x fc )o- J -(s fc ) 
< l,Vfc. For b k < a k < 1. 

Ee f k (=* i =a)z M z„ = ((1 _ Qfe)(1 + 5fc)r i/2 _ 

Remark 20 77it« w/;en we se? i = -rl— , the bound on e 
implies that b k < a k < 1/2, Vfc: 

a k = f(l + pij (xk))(Ti(xk)(Tj(xk) 

s o + i \ r„ \ eo-i{x k )o-j{x k ) 1 
< 2ta i (x k )a J (x k ) = — < -. 

We can now finish showing the large deviation bound 

for maxjj \Sij — ESij\. Given that A\, . . . ,A n are 
independent, we have 

fc=l 

2t / X k — Xq 



fc=i 



o"ij(a;fc) 



fe=i 



(31) 



= e -nte-nt$i(i,j)+§ Efc=i!n (i-„ t j(i +lt) 

< exp f-n*e + n< 2 $ 2 + nf 3 $ 3 + ^ni 4 $ 4 ^ , 

where the last step is due to Remark [20] and Lemma [171 
Now let us consider taking t that minimizes 

exp (-nte + nt 2 § 2 + nt 3 § 3 + §77i 4 $ 4 ); Leti = jj-: 

^(-nte + nt 2 $ 2 + ^ 3 $3 + |nt 4 $ 4 ) < Now 

given that |r < if. Claim [T6l and [T8l 



vfc=l 



h \ h 



< exp ( — nte + nt~<fr 2 + nt $3 + -nt $4 



< exp 

< exp 



-ne 2 ne 2 ne 2 e$ 3 9 ne 2 e 2 <I> 4 



4$ 2 163>2 64$ 2 $| 5 256$ 2 $ 2 
-3ne 2> 



20$ 2 

< exp — 



3nhe 2 

2QC 1 {a 2 {xoWf{^)+al{xo)))- 
Finally, let's check the requirement one < jj, 

{C l {\+ pl{x Q ))a 2 {x )^(x Q )) /h 



e - 



(C 1 (l + p 2 J (xo))aUxo)a 2 (x )) 
:nnx,v , „ (2K <Ji{x k )aj {x k )) ' 



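For completeness, we compute the moment generating function for $Z_{k,i} Z_{k,j}$.

Lemma 21 Let $t(1 + \rho_{ij}(x_k))\sigma_i(x_k)\sigma_j(x_k) < 1$, $\forall k$, so that $b_k \le a_k < 1$, omitting $x_k$ everywhere,

$$E e^{t Z_{k,i} Z_{k,j}} = \left( (1 - t(\sigma_i\sigma_j + \sigma_{ij}))(1 + t(\sigma_i\sigma_j - \sigma_{ij})) \right)^{-1/2}.$$

Proof: W.l.o.g., let $i = 1$ and $j = 2$.

$$E\left(e^{tZ_1Z_2}\right) = E\left(E\left(e^{tZ_2Z_1} \mid Z_2\right)\right) = E\exp\left(\left(\frac{t\rho_{12}\sigma_1}{\sigma_2} + \frac{t^2\sigma_1^2(1 - \rho_{12}^2)}{2}\right) Z_2^2\right)$$
$$= \left(1 - 2\left(\frac{t\rho_{12}\sigma_1}{\sigma_2} + \frac{t^2\sigma_1^2(1 - \rho_{12}^2)}{2}\right)\sigma_2^2\right)^{-1/2}$$
$$= \left(1 - \left(2t\rho_{12}\sigma_1\sigma_2 + t^2\sigma_1^2\sigma_2^2(1 - \rho_{12}^2)\right)\right)^{-1/2}$$
$$= \left((1 - t(1 + \rho_{12})\sigma_1\sigma_2)(1 + t(1 - \rho_{12})\sigma_1\sigma_2)\right)^{-1/2},$$

where $2t\rho_{12}\sigma_1\sigma_2 + t^2\sigma_1^2\sigma_2^2(1 - \rho_{12}^2) < 1$. This requires that $t < \frac{1}{(1 + \rho_{12})\sigma_1\sigma_2}$, which is equivalent to $2t\rho_{12}\sigma_1\sigma_2 + t^2\sigma_1^2\sigma_2^2(1 - \rho_{12}^2) - 1 < 0$. One can check that if we require $t(1 + \rho_{12})\sigma_1\sigma_2 < 1$, which implies that $t\sigma_1\sigma_2 < 1 - t\rho_{12}\sigma_1\sigma_2$ and hence $t^2\sigma_1^2\sigma_2^2 < (1 - t\rho_{12}\sigma_1\sigma_2)^2$, the lemma holds. □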
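
The closed form in Lemma 21 can be cross-checked numerically. The sketch below compares it with the general identity $E\exp(z^T A z) = \det(I - 2A\Sigma)^{-1/2}$ for $z \sim N(0, \Sigma)$, applied with $A = \frac{t}{2}\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$ so that $z^T A z = t z_1 z_2$. The function names and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def mgf_closed_form(t, s1, s2, rho):
    """Lemma 21 form: ((1 - t(s1 s2 + s12)) (1 + t(s1 s2 - s12)))^(-1/2)."""
    s12 = rho * s1 * s2
    return ((1 - t * (s1 * s2 + s12)) * (1 + t * (s1 * s2 - s12))) ** -0.5

def mgf_determinant(t, s1, s2, rho):
    """Same expectation through det(I - 2 A Sigma)^(-1/2) with z^T A z = t z1 z2."""
    s12 = rho * s1 * s2
    sigma = np.array([[s1**2, s12], [s12, s2**2]])
    A = 0.5 * t * np.array([[0.0, 1.0], [1.0, 0.0]])
    return np.linalg.det(np.eye(2) - 2 * A @ sigma) ** -0.5

# Example values satisfying t (1 + rho) s1 s2 = 0.336 < 1.
t, s1, s2, rho = 0.2, 1.5, 0.8, 0.4
closed = mgf_closed_form(t, s1, s2, rho)
via_det = mgf_determinant(t, s1, s2, rho)
```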

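6 Smoothness and Sparsity of $\Sigma_t$ via $\Sigma_t^{-1}$

In this section we show that if we assume $\Theta(x) = (\theta_{ij}(x))$ are smooth and twice differentiable functions of $x \in [0, 1]$, i.e., $\theta'_{ij}(x) < \infty$ and $\theta''_{ij}(x) < \infty$ for $x \in [0, 1]$, $\forall i, j$, and satisfy A3, then the smoothness conditions of Lemma 2 are satisfied. The following is a standard result in matrix analysis.

Lemma 22 Let $\Theta(t) \in \mathbb{R}^{p \times p}$ have entries that are differentiable functions of $t \in [0, 1]$. Assuming that $\Theta(t)$ is always non-singular, then

$$\frac{d}{dt}[\Sigma(t)] = -\Sigma(t) \frac{d}{dt}[\Theta(t)] \Sigma(t).$$

Lemma 23 Suppose $\Theta(t) \in \mathbb{R}^{p \times p}$ has entries that each are twice differentiable functions of $t$. Assuming that $\Theta(t)$ is always non-singular, then

$$\frac{d^2}{dt^2}[\Sigma(t)] = \Sigma(t) D(t) \Sigma(t), \quad \text{where}$$

$$D(t) = 2\frac{d}{dt}[\Theta(t)] \Sigma(t) \frac{d}{dt}[\Theta(t)] - \frac{d^2}{dt^2}[\Theta(t)].$$

Proof: The existence of the second order derivatives for entries of $\Sigma(t)$ is due to the fact that $\Sigma(t)$ and $\frac{d}{dt}[\Theta(t)]$ are both differentiable $\forall t \in [0, 1]$; indeed by Lemma 22,

$$\frac{d^2}{dt^2}[\Sigma(t)] = \frac{d}{dt}\left[-\Sigma(t)\frac{d}{dt}[\Theta(t)]\Sigma(t)\right]$$
$$= -\frac{d}{dt}[\Sigma(t)]\frac{d}{dt}[\Theta(t)]\Sigma(t) - \Sigma(t)\frac{d}{dt}\left[\frac{d}{dt}[\Theta(t)]\Sigma(t)\right]$$
$$= \Sigma(t)\left(2\frac{d}{dt}[\Theta(t)]\Sigma(t)\frac{d}{dt}[\Theta(t)] - \frac{d^2}{dt^2}[\Theta(t)]\right)\Sigma(t),$$

hence the lemma holds by the definition of $D(t)$. □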
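
The derivative identities for the inverse, $\Sigma' = -\Sigma\Theta'\Sigma$ and $\Sigma'' = \Sigma(2\Theta'\Sigma\Theta' - \Theta'')\Sigma$, can be checked against finite differences. The precision path `theta(t)` below is a hypothetical smooth example chosen only for this illustration; it does not come from the paper.

```python
import numpy as np

def theta(t):
    """Hypothetical smooth, positive definite precision path (illustration only)."""
    return np.array([[2.0 + t, 0.5 * np.sin(t)],
                     [0.5 * np.sin(t), 1.5 + t**2]])

def theta_prime(t):
    return np.array([[1.0, 0.5 * np.cos(t)],
                     [0.5 * np.cos(t), 2.0 * t]])

def theta_second(t):
    return np.array([[0.0, -0.5 * np.sin(t)],
                     [-0.5 * np.sin(t), 2.0]])

def sigma(t):
    return np.linalg.inv(theta(t))

t0, eps = 0.3, 1e-4
S = sigma(t0)
# Closed-form derivatives of the inverse.
first_formula = -S @ theta_prime(t0) @ S
D = 2 * theta_prime(t0) @ S @ theta_prime(t0) - theta_second(t0)
second_formula = S @ D @ S
# Central finite differences of sigma(t) for comparison.
first_numeric = (sigma(t0 + eps) - sigma(t0 - eps)) / (2 * eps)
second_numeric = (sigma(t0 + eps) - 2 * S + sigma(t0 - eps)) / eps**2
```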

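Let $\Sigma(x) = (\sigma_{ij}(x))$, $\forall x \in [0, 1]$. Let $\Sigma(x) = (\Sigma_1(x), \Sigma_2(x), \ldots, \Sigma_p(x))$, where $\Sigma_i(x) \in \mathbb{R}^p$ denotes a column vector. By Lemma 22 and Lemma 23,

$$\sigma'_{ij}(x) = -\Sigma_i^T(x)\,\Theta'(x)\,\Sigma_j(x), \qquad (32)$$
$$\sigma''_{ij}(x) = \Sigma_i^T(x)\,D(x)\,\Sigma_j(x), \qquad (33)$$

where $\Theta'(x) = (\theta'_{ij}(x))$, $\forall x \in [0, 1]$.

Lemma 24 Given A2 and A3, $\forall x \in [0, 1]$, $|\sigma'_{ij}(x)| \le S_0^2 \sqrt{S_1} < \infty$.

Proof: $|\sigma'_{ij}(x)| = |\Sigma_i^T(x)\,\Theta'(x)\,\Sigma_j(x)|$

$$\le \max_{i=1,\ldots,p} |\sigma_i^2(x)| \sum_{k=1}^p \sum_{\ell=1}^p |\theta'_{k\ell}(x)| \le S_0^2 \sqrt{S_1}. \quad □$$

We denote the elements of $\Theta(x)$ by $\theta_{jk}(x)$. Let $\theta'_\ell$ represent a column vector of $\Theta'$.

Theorem 25 Given A2 and A3, $\forall i, j$,

$$\sup_{x \in [0,1]} |\sigma''_{ij}(x)| \le 2 S_0^4 S_1 + S_0^2 S_2 < \infty.$$

Proof: By (33) and the triangle inequality,

$$|\sigma''_{ij}(x)| = |\Sigma_i^T(x)\,D(x)\,\Sigma_j(x)| \le \max_{i=1,\ldots,p} |\sigma_i^2(x)| \sum_{k=1}^p \sum_{\ell=1}^p |D_{k\ell}(x)|$$
$$\le S_0^2 \sum_{k=1}^p \sum_{\ell=1}^p \left( 2\,|\theta_k'^T(x)\,\Sigma(x)\,\theta'_\ell(x)| + |\theta''_{k\ell}(x)| \right) \le 2 S_0^4 S_1 + S_0^2 S_2,$$

where by A3, $\sum_{k=1}^p \sum_{\ell=1}^p |\theta''_{k\ell}(x)| \le S_2$, and

$$\sum_{k=1}^p \sum_{\ell=1}^p |\theta_k'^T(x)\,\Sigma(x)\,\theta'_\ell(x)| = \sum_{k=1}^p \sum_{\ell=1}^p \left| \sum_{i=1}^p \sum_{j=1}^p \theta'_{ki}(x)\,\theta'_{\ell j}(x)\,\sigma_{ij}(x) \right|$$
$$\le \max_{i,j} |\sigma_{ij}(x)| \sum_{k=1}^p \sum_{\ell=1}^p \sum_{i=1}^p \sum_{j=1}^p |\theta'_{ki}(x)\,\theta'_{\ell j}(x)| \le S_0^2 S_1. \quad □$$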



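7 Some Implications of a Very Sparse $\Theta$

We use $\mathcal{L}^1$ to denote Lebesgue measure on $\mathbb{R}$. The aim of this section is to prove some bounds that correspond to A3, but only for $\mathcal{L}^1$ a.e. $x \in [0, 1]$, based on a single sparsity assumption on $\Theta$ as in A5. We let $E \subset [0, 1]$ represent the "bad" set with $\mathcal{L}^1(E) = 0$, and $\mathcal{L}^1$ a.e. $x \in [0, 1]$ refer to points in the set $[0, 1] \setminus E$ such that $\mathcal{L}^1([0, 1] \setminus E) = 1$. When $\|\Theta(x)\|_0 \le s + p$ for all $x \in [0, 1]$, we immediately obtain Theorem 26, whose proof appears in Section 7.1. We would like to point out that although we apply Theorem 26 to $\Theta$ and deduce smoothness of $\Sigma$, we could apply it the other way around. In particular, it might be interesting to apply it to the correlation coefficient matrix $(\rho_{ij})$, where the diagonal entries remain invariant. We use $\Theta'(x)$ and $\Theta''(x)$ to denote $(\theta'_{ij}(x))$ and $(\theta''_{ij}(x))$ respectively $\forall x$.

A5 Assume that $\|\Theta(x)\|_0 \le s + p$ $\forall x \in [0, 1]$.

A6 $\exists S_4, S_5 < \infty$ such that

$$S_4 = \max_{i,j} \|\theta'_{ij}\|_\infty^2 \quad \text{and} \quad S_5 = \max_{i,j} \|\theta''_{ij}\|_\infty. \qquad (34)$$

We state a theorem, the proof of which is in Section 7.1, and a corollary.

Theorem 26 Under A5, we have $\|\Theta''(x)\|_0 \le \|\Theta'(x)\|_0 \le \|\Theta(x)\|_0 \le s + p$ for $\mathcal{L}^1$ a.e. $x \in [0, 1]$.

Corollary 27 Given A2 and A5, for $\mathcal{L}^1$ a.e. $x \in [0, 1]$,

$$|\sigma'_{ij}(x)| \le S_0^2 \sqrt{S_4}\,(s + p) < \infty. \qquad (35)$$

Proof: By proof of Lemma 24, $|\sigma'_{ij}(x)| \le \max_{i=1,\ldots,p} \|\sigma_i^2\|_\infty \sum_{k=1}^p \sum_{\ell=1}^p |\theta'_{k\ell}(x)|$. Hence by Theorem 26, for $\mathcal{L}^1$ a.e. $x \in [0, 1]$,

$$|\sigma'_{ij}(x)| \le \max_{i=1,\ldots,p} \|\sigma_i^2\|_\infty \sum_{k=1}^p \sum_{\ell=1}^p |\theta'_{k\ell}(x)| \le S_0^2 \max_{k,\ell} \|\theta'_{k\ell}\|_\infty \|\Theta'(x)\|_0 \le S_0^2 \sqrt{S_4}\,(s + p). \quad □$$

Lemma 28 Under A5 and A6, for $\mathcal{L}^1$ a.e. $x \in [0, 1]$,

$$\sum_{k=1}^p \sum_{\ell=1}^p \sum_{i=1}^p \sum_{j=1}^p |\theta'_{ki}(x)\,\theta'_{\ell j}(x)| \le (s + p)^2 \max_{i,j} \|\theta'_{ij}\|_\infty^2,$$
$$\sum_{k=1}^p \sum_{\ell=1}^p |\theta''_{k\ell}(x)| \le (s + p) \max_{i,j} \|\theta''_{ij}\|_\infty, \quad \text{hence}$$
$$\operatorname*{ess\,sup}_{x \in [0,1]} \sigma''_{ij}(x) \le 2 S_0^4 (s + p)^2 S_4 + S_0^2 (s + p) S_5.$$

Proof: By the triangle inequality, for $\mathcal{L}^1$ a.e. $x \in [0, 1]$,

$$|\sigma''_{ij}(x)| = |\Sigma_i^T D\, \Sigma_j| \le \max_{i=1,\ldots,p} \|\sigma_i^2\|_\infty \sum_{k=1}^p \sum_{\ell=1}^p |D_{k\ell}(x)|$$
$$\le 2 S_0^2 \sum_{k=1}^p \sum_{\ell=1}^p |\theta_k'^T \Sigma\, \theta'_\ell| + S_0^2 \sum_{k=1}^p \sum_{\ell=1}^p |\theta''_{k\ell}| \le 2 S_0^4 (s + p)^2 S_4 + S_0^2 (s + p) S_5,$$

where for $\mathcal{L}^1$ a.e. $x \in [0, 1]$,

$$\sum_{k=1}^p \sum_{\ell=1}^p |\theta_k'^T \Sigma\, \theta'_\ell| \le \sum_{k=1}^p \sum_{\ell=1}^p \sum_{i=1}^p \sum_{j=1}^p |\theta'_{ki}\,\theta'_{\ell j}\,\sigma_{ij}| \le \max_{i=1,\ldots,p} \|\sigma_i^2\|_\infty \sum_{k=1}^p \sum_{\ell=1}^p \sum_{i=1}^p \sum_{j=1}^p |\theta'_{ki}\,\theta'_{\ell j}| \le S_0^2 (s + p)^2 S_4$$

and $\sum_{k=1}^p \sum_{\ell=1}^p |\theta''_{k\ell}| \le (s + p) S_5$. The first inequality is due to the following observation: at most $(s + p)^2$ elements in the sum of $\sum_k \sum_i \sum_\ell \sum_j |\theta'_{ki}(x)\,\theta'_{\ell j}(x)|$ for $\mathcal{L}^1$ a.e. $x \in [0, 1]$, that is, except for $E$, are non-zero, due to the fact that for $x \in [0, 1] \setminus N$, $\|\Theta'(x)\|_0 \le \|\Theta(x)\|_0 \le s + p$ as in Theorem 26. The second inequality is obtained similarly using the fact that for $\mathcal{L}^1$ a.e. $x \in [0, 1]$, $\|\Theta''(x)\|_0 \le \|\Theta(x)\|_0 \le s + p$. □

Remark 29 For the bad set $E \subset [0, 1]$ with $\mathcal{L}^1(E) = 0$, $\sigma'_{ij}(x)$ is well defined as shown in Lemma 22, but it can only be loosely bounded by $O(p^2)$, as $\|\Theta'(x)\|_0 = O(p^2)$, instead of $s + p$, for $x \in E$; similarly, $\sigma''_{ij}(x)$ can only be loosely bounded by $O(p^4)$.

By Lemma 28, using the Lebesgue integral, we can derive the following corollary.

Corollary 30 Under A2, A5, and A6,

$$\int_0^1 \left(\sigma''_{ij}(x)\right)^2 dx \le 2 S_0^4 S_4 (s + p)^2 + S_0^2 S_5 (s + p) < \infty.$$

7.1 Proof of Theorem 26

Let $\|\Theta(x)\|_0 \le s + p$ for all $x \in [0, 1]$.

Lemma 31 Let a function $u : [0, 1] \to \mathbb{R}$. Suppose $u$ has a derivative on $F$ (finite or not) with $\mathcal{L}^1(u(F)) = 0$. Then $u'(x) = 0$ for $\mathcal{L}^1$ a.e. $x \in F$.

Take $F = \{x \in [0, 1] : \theta_{ij}(x) = 0\}$ and $u = \theta_{ij}$. For $\mathcal{L}^1$ a.e. $x \in F$, that is, except for a set $N_{ij}$ of $\mathcal{L}^1(N_{ij}) = 0$, $\theta'_{ij}(x) = 0$. Let $N = \bigcup_{i,j} N_{ij}$. By Lemma 31,

Lemma 32 If $x \in [0, 1] \setminus N$, where $\mathcal{L}^1(N) = 0$, if $\theta_{ij}(x) = 0$, then $\theta'_{ij}(x) = 0$ for all $i, j$.

Let $v_{ij} = \theta'_{ij}$. Take $F = \{x \in [0, 1] : v_{ij}(x) = 0\}$. For $\mathcal{L}^1$ a.e. $x \in F$, that is, except for a set $N^1_{ij}$ with $\mathcal{L}^1(N^1_{ij}) = 0$, $v'_{ij}(x) = 0$. Let $N_1 = \bigcup_{i,j} N^1_{ij}$. By Lemma 31,

Lemma 33 If $x \in [0, 1] \setminus N_1$, where $\mathcal{L}^1(N_1) = 0$, if $\theta'_{ij}(x) = 0$, then $\theta''_{ij}(x) = 0$, $\forall i, j$.

Thus this allows us to conclude that

Lemma 34 If $x \in [0, 1] \setminus (N \cup N_1)$, where $\mathcal{L}^1(N \cup N_1) = 0$, if $\theta_{ij}(x) = 0$, then $\theta'_{ij}(x) = 0$ and $\theta''_{ij}(x) = 0$, $\forall i, j$.

Thus for all $x \in [0, 1] \setminus (N \cup N_1)$, $\|\Theta''(x)\|_0 \le \|\Theta'(x)\|_0 \le \|\Theta(x)\|_0 \le (s + p)$. □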


8 Examples 

In this section, we demonstrate the effectiveness of the method in a simulation. Starting at time $t = t_0$, the original graph is as shown at the top of Figure 1. The graph evolves according to a type of Erdős-Rényi random graph model. Initially we set $\Theta = 0.25 I_{p \times p}$, where $p = 50$. Then, we randomly select 50 edges and update $\Theta$ as follows: for each new edge $(i, j)$, a weight $a > 0$ is chosen uniformly at random from $[0.1, 0.3]$; we subtract $a$ from $\theta_{ij}$ and $\theta_{ji}$, and increase $\theta_{ii}$, $\theta_{jj}$ by $a$. This keeps $\Sigma$ positive definite. When we later delete an existing edge from the graph, we reverse the above procedure with its weight. Weights are assigned to the initial 50 edges, and then we change the graph structure periodically as follows: every 200 discrete time steps, five existing edges are deleted, and five new edges are added. However, for each of the five new edges, a target weight is chosen, and the weight on the edge is gradually changed over the ensuing 200 time steps in order to ensure smoothness. Similarly, for each of the five edges to be deleted, the weight gradually decays to zero over the ensuing 200 time steps. Thus, almost always, there are 55 edges in the graph and 10 edges have weights that are varying smoothly.
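The edge update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: the function names, the seed, and the linear form of the weight ramp are all hypothetical (the text only says the weight changes gradually). The helper `diag_dominance_margin` makes the positive definiteness claim concrete: each update leaves every row of the precision matrix strictly diagonally dominant with margin 0.25.

```python
import random

def add_edge(theta, edge, a):
    # Subtract a from theta_ij and theta_ji, add a to theta_ii and theta_jj.
    i, j = edge
    theta[i][j] -= a
    theta[j][i] -= a
    theta[i][i] += a
    theta[j][j] += a

def remove_edge(theta, edge, a):
    # Reverse the update above using the edge's stored weight.
    add_edge(theta, edge, -a)

def initial_precision(p=50, n_edges=50, seed=0):
    # Start from 0.25 * I and add n_edges distinct random edges.
    rng = random.Random(seed)
    theta = [[0.25 if i == j else 0.0 for j in range(p)] for i in range(p)]
    weights = {}
    while len(weights) < n_edges:
        i, j = rng.sample(range(p), 2)
        edge = (min(i, j), max(i, j))
        if edge in weights:
            continue
        a = rng.uniform(0.1, 0.3)
        add_edge(theta, edge, a)
        weights[edge] = a
    return theta, weights

def ramp(target, step, n_steps=200):
    # Assumed linear ramp from 0 to the target weight over n_steps steps.
    return target * min(max(step, 0), n_steps) / n_steps

def diag_dominance_margin(theta):
    # Smallest value of theta_ii - sum_{j != i} |theta_ij| over all rows.
    return min(
        row[i] - sum(abs(v) for j, v in enumerate(row) if j != i)
        for i, row in enumerate(theta)
    )
```

A symmetric matrix with positive diagonal that is strictly diagonally dominant is positive definite, so a positive margin certifies that $\Theta$, and hence $\Sigma = \Theta^{-1}$, is positive definite.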

8.1 Regularization Paths 

We increase the sample size from $n = 200$ to 400, 600, and 800 and use a Gaussian kernel with bandwidth $h = \frac{5.848}{n^{1/3}}$. We use the following metrics to evaluate model consistency risk for (3) and predictive risk (4) in Figure 1 as the $\ell_1$ regularization parameter $\rho$ increases.

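• Let $F_n$ denote edges in estimated $\hat\Theta_n(t_0)$ and $F$ denote edges in $\Theta(t_0)$. Let us define

$$\text{precision} = 1 - \frac{|F_n \setminus F|}{|F_n|} = \frac{|F_n \cap F|}{|F_n|}, \qquad \text{recall} = 1 - \frac{|F \setminus F_n|}{|F|} = \frac{|F_n \cap F|}{|F|}.$$

Figure 1 shows how they change with $\rho$.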

• Predictive risks in (4) are plotted for both the oracle estimator $\Sigma^*$ and the empirical estimators (7) for each $n$. They are indexed with the $\ell_1$ norm of various estimators vectorized; hence $|\cdot|_1$ for $\hat\Sigma_n(t_0)$ and $\Sigma^*(t_0)$ are the same along a vertical line. Note that $|\Sigma^*(t_0)|_1 \le |\Sigma(t_0)|_1$, $\forall \rho \ge 0$; for every estimator $\tilde\Sigma$ (the oracle or empirical), $|\tilde\Sigma|_1$ decreases as $\rho$ increases, as shown in Figure 1 for $|\hat\Sigma_{200}(t_0)|_1$.
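The precision and recall metrics in the first item can be computed directly from edge sets. The sketch below is illustrative only: the function names and the threshold `tol` used to decide which entries count as edges are assumptions, as is the convention of reporting precision 0 when no edges are predicted.

```python
def edge_set(theta, tol=1e-8):
    # Undirected edges (i, j), i < j, whose off-diagonal entry is nonzero.
    p = len(theta)
    return {
        (i, j)
        for i in range(p)
        for j in range(i + 1, p)
        if abs(theta[i][j]) > tol
    }

def precision_recall(est_edges, true_edges):
    # precision = |F_n ∩ F| / |F_n|, recall = |F_n ∩ F| / |F|.
    common = len(est_edges & true_edges)
    precision = common / len(est_edges) if est_edges else 0.0
    recall = common / len(true_edges) if true_edges else 0.0
    return precision, recall
```

For instance, with estimated edges {(0, 1), (1, 2), (2, 3)} and true edges {(0, 1), (2, 3), (3, 4), (0, 4)}, two edges are shared, so precision is 2/3 and recall is 1/2.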

Figure 2 shows a subsequence of estimated graphs as $\rho$ increases for sample size $n = 200$. The original graph at $t_0$ is shown in Figure 1.

8.2 Chasing the Changes 

Finally, we show how quickly the smoothed estimator using GLASSO [FHT07] can include the edges that are being added at the beginning of the interval $[0, 1]$, and get rid of the edges being replaced, whose weights start to decrease at $x = 0$ and become $0$ at $x = 0.5$, in Figure 3.

Figure 1: Plots from top to bottom show that as the penalization parameter $\rho$ increases, precision goes up, and then down as no edges are predicted in the end. Recall goes down as the estimated graphs are missing more and more edges. The oracle $\Sigma^*$ performs the best, given the same value for $|\hat\Sigma_n(t_0)|_1 = |\Sigma^*|_1$, $\forall n$.

[Figure 3 panels: Edges; 0.35, 0.4875, 0.52, 0.5275, 0.6125]

Figure 3: There are 400 discrete steps in $[0, 1]$ such that the edge set $F(t)$ remains unchanged before or after $t = 0.5$. This sequence of plots shows the times at which each of the new edges added at $t = 0$ appears in the estimated graph (top row), and the times at which each of the old edges being replaced is removed from the estimated graph (bottom row), where the weight decreases from a positive value in $[0.1, 0.3]$ to zero during the time interval $[0, 0.5]$. Solid and dashed lines denote new and old edges respectively.

9 Conclusions and Extensions 

We have shown that if the covariance changes smoothly over time, then minimizing an $\ell_1$-penalized kernel risk function leads to good estimates of the covariance matrix. This, in turn, allows estimation of time varying graphical structure. The method is easy to apply and is feasible in high dimensions.

We are currently addressing several extensions to this work. First, with stronger conditions we expect that we can establish sparsistency, that is, we recover the edges with probability approaching one. Second, we can relax the smoothness assumption using nonparametric changepoint methods [GH02] which allow for jumps. Third, we used a very simple time series model; extensions to more general time series models are certainly feasible.
A Large Deviation Inequalities for Boxcar Kernel Function

In this section, we prove the following lemma, which implies the i.i.d. case as in the corollary.

Lemma 35 Using a boxcar kernel that weighs uniformly over $n$ samples $Z_k \sim N(0, \Sigma(k))$, $k = 1, \ldots, n$, that are independently but not identically distributed, we have for $\epsilon$ small enough, for some $c_2 > 0$,

$$P\left(|S_n(t,i,j) - \mathbb{E} S_n(t,i,j)| > \epsilon\right) \le \exp\{-c_2 n \epsilon^2\}.$$



Corollary 36 For the i.i.d. case, for some $c_3 > 0$,

$$P\left(|S_n(i,j) - \mathbb{E} S_n(i,j)| > \epsilon\right) \le \exp\{-c_3 n \epsilon^2\}.$$



Lemma 35 is implied by Lemma 37 for diagonal entries, and Lemma 38 for non-diagonal entries.



Figure 2: $n = 200$ and $h = 1$ with $\rho = 0.14, 0.2, 0.24$ indexing each row. The three columns, titled $G(p, F_n)$, $G(p, F_n \setminus F)$, and $G(p, F \setminus F_n)$, show sets of edges in $F_n$, extra edges, and missing edges with respect to the true graph $G(p, F)$. This array of plots shows that $\ell_1$ regularization is effective in selecting the subset of edges in the true model $\Theta(t_0)$, even when the samples before $t_0$ were from graphs that evolved over time.



A.1 Inequalities for Squared Sum of Independent Normals with Changing Variances

Throughout this section, we use $\sigma_i^2$ as a shorthand for $\sigma_{ii}$ as before. Hence $\sigma_i^2(x_k) = \mathrm{Var}(Z_{k,i}) = \sigma_{ii}(x_k)$, $\forall k = 1, \ldots, n$. Ignoring the bias term as in (29), we wish to show that each of the diagonal entries of $S_n$ is close to $\sigma_i^2(x_0)$, $\forall i = 1, \ldots, p$. For a boxcar kernel that weighs uniformly over $n$ samples, we mean strictly $\ell_k(x_0) = \frac{1}{n}$, $\forall k = 1, \ldots, n$, and $h = 1$ for (28) in this context. We omit the mention of $i$ or $t$ in all symbols from here on. The following lemma might be of independent interest; hence we include it here. We omit the proof due to its similarity to that of Lemma 13.

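Lemma 37 We let $z_1, \ldots, z_n$ represent a sequence of independent Gaussian random variables such that $z_k \sim N(0, \sigma^2(x_k))$. Let $\bar\sigma^2 = \frac{1}{n} \sum_{k=1}^{n} \sigma^2(x_k)$. Using a boxcar kernel that weighs uniformly over $n$ samples, $\forall \epsilon < c\bar\sigma^2$, for some $c \ge 2$, we have

$$P\left(\left|\frac{1}{n} \sum_{k=1}^{n} z_k^2 - \bar\sigma^2\right| > \epsilon\right) \le \exp\left\{\frac{-(3c - 5)\, n \epsilon^2}{3 c^2 \bar\sigma^2 \sigma^2_{\max}}\right\},$$

where $\sigma^2_{\max} = \max_{k=1,\ldots,n} \{\sigma^2(x_k)\}$.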
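The qualitative content of Lemma 37 can be seen in a small Monte Carlo experiment: the uniformly weighted average of squared independent normals with changing variances concentrates around the average variance as $n$ grows. The sketch below is illustrative only; the variance profile $\sigma^2(x_k) = 1 + 0.5\,k/n$, the seed, and the function name are assumptions and do not come from the paper.

```python
import random

def tail_frequency(n, eps, trials=400, seed=0):
    # Fraction of trials in which |(1/n) sum z_k^2 - mean variance| > eps,
    # where z_k ~ N(0, sigma^2(x_k)) are independent, not identically distributed.
    rng = random.Random(seed)
    variances = [1.0 + 0.5 * k / n for k in range(1, n + 1)]  # assumed profile
    mean_var = sum(variances) / n
    sds = [v ** 0.5 for v in variances]
    exceed = 0
    for _ in range(trials):
        s = sum(rng.gauss(0.0, sd) ** 2 for sd in sds) / n
        if abs(s - mean_var) > eps:
            exceed += 1
    return exceed / trials
```

Comparing `tail_frequency(50, 0.3)` with `tail_frequency(800, 0.3)` shows the deviation probability shrinking as the sample size increases, in line with the exponential rate in $n\epsilon^2$.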

A.2 Inequalities for Independent Sum of Products of Correlated Normals

The proof of Lemma 38 follows that of Lemma 15.



Lemma 38 Let * 2 = £ ELi 1 2 1 

and C4 = 2 Q% 2 - Using a boxcar kernel that weighs uni- 
formly over n samples, for e < maXfc(CTi( * 2 fc)CTj(Xfc)) , 

P (\S n (t,i,j) - ES n (t,i,j)\ > e) < exp{~c 4 ne 2 } . 

B Proofs for Large Deviation Inequalities
B.1 Proof of Claim 16

We show one inequality; the other one is bounded similarly. $\forall k$, we compare the $k$-th elements $\Phi_{2,k}$, $\Phi_{4,k}$ that appear in the sum for $\Phi_2$ and $\Phi_4$ respectively:

$4, fe _ (ai + bi)±t 2 



$2, fc (a 2 + b 2 )4t± 
( 2 / x k - x Q 
= (h K [—h— 



o-i(x k )oj(x k ) 
2((l + Pt] {x k )) A + {\~ Pt] {x k )f 



8(1+4.(3!*)) 

< max ^j-K (^ Xk h X ° ^j Vi(xk)°~j{xk) 

(l+p) 4 + (l-p) 4 2 
max — — ■ — — = 2M . □ 

o<p<i 4(1 + p 2 ) 

B.2 Proof of Lemma Q3] 

We first use the Taylor expansions to obtain:

$$\ln(1 - a_k) = -a_k - \frac{a_k^2}{2} - \frac{a_k^3}{3} - \frac{a_k^4}{4} - \sum_{l=5}^{\infty} \frac{a_k^l}{l},$$

where,

$$\sum_{l=5}^{\infty} \frac{a_k^l}{l} \le \frac{1}{5} \sum_{l=5}^{\infty} a_k^l = \frac{a_k^5}{5(1 - a_k)}$$

for $a_k < 1/2$; similarly,

$$\ln(1 + b_k) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1} b_k^n}{n},$$



where 



(=4 Z=5 

Hence for $b_k \le a_k \le \frac{1}{2}$, $\forall k$,

1 ™ 1 

2^ ^(1-^(1 + 60 

- 2—i 9 4 



fc=i 



(i 



5 8 



$$n t \Phi_1 + n t^2 \Phi_2 + n t^3 \Phi_3 + \frac{9}{5} n t^4 \Phi_4. \quad \square$$
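The Taylor expansion of $\ln(1 - a_k)$ used in this proof, together with the geometric bound on its remainder, $\sum_{l \ge 5} a^l / l \le a^5 / (5(1 - a))$, can be checked numerically. The sketch below is a sanity check only; the function names and the truncation at 200 terms are assumptions made for illustration.

```python
import math

def log_tail(a, terms=200):
    # Partial sum approximating sum_{l >= 5} a^l / l (truncated after `terms` terms).
    return sum(a ** l / l for l in range(5, 5 + terms))

def tail_bound(a):
    # Geometric-series bound a^5 / (5 (1 - a)) on the remainder.
    return a ** 5 / (5.0 * (1.0 - a))

def check_expansion(a):
    # Returns (|ln(1 - a) - (head - tail)|, whether tail <= bound).
    head = -sum(a ** l / l for l in range(1, 5))
    tail = log_tail(a)
    return abs(math.log(1.0 - a) - (head - tail)), tail <= tail_bound(a)
```

For any $0 < a \le 1/2$, `check_expansion(a)` should return an error at the level of floating-point round-off and confirm that the remainder is below the bound.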



B.3 Proof of Claim [18] 

We replace the sum with the Riemann integral, and then use Taylor's formula to replace $\sigma_i(x_k)$, $\sigma_j(x_k)$, and $\sigma_{ij}(x_k)$,

= 1 E^ 2 ( X -^) K(^)a|K) + 4(x fc )) 



n ^— ' h 2 \ h 

fe=i v 

" 1^ (^) WWW + 4 dw 
2 f° 

— / K 2 (v) (of (x + hv)a 2 (x + hv) + er 2 -(x + /if)) 
2 ^^(,)^(, ) + ^(,o) + ^ (m)(H2VJ 
' cr''(y 2 )(H 2 \ 2
o-j(a;o) + hvajixa) H 



1 cry (xq ) + hva^ (x ) H ] aw 

2 f 



K» ((1 + 4-(a;o))a?(xo)^(xo)) dt> 
-l 
o 

2/ 



C 2 y ^(u)d« + 0(/i) 



where $y_0, y_1, y_2 \le hv + x_0$ and $C_1$, $C_2$ are some constants chosen so that all equalities hold. □



